Abstract While several impressive quantum technologies have been proposed, they have intrinsic
limitations which circuit designers must consider in order to produce realizable circuits. Limited
interaction distance between gate qubits is one of the most common limitations. In this paper, we suggest
extensions of the existing synthesis flow aimed at realizing circuits for quantum architectures with linear
nearest neighbor (LNN) interaction. To this end, a template matching optimization, an exact synthesis
approach, and two reordering strategies are introduced. The proposed methods are combined into an integrated
synthesis flow. Experiments show that by using the suggested flow, quantum cost can be improved by more
than 50% on average.



1 Introduction 

Since the invention of the integrated circuit in 1958, the number of transistors in such circuits has doubled 
approximately every two years (also known as Moore's Law). Currently, semiconductor technology has 
advanced the world towards more powerful systems by decreasing the transistor size. However, further
miniaturization is beginning to appear infeasible due to the density of power dissipation and the impossibility
of patterning features that approach the atomic scale.

The barriers to ongoing improvements in semiconductor technology have intensified the
attraction of alternative computing paradigms such as quantum computing. It has been shown that quantum
computing could improve the rate of advance in processing power, at least for several applications [1]. In
principle, there are several problems that cannot be solved on a classical Turing machine as efficiently
as on a quantum computer. Quantum computers would provide exponential speedups on several problems,
including factoring of numbers and simulating the quantum-mechanical behavior of physical systems [2].
However, several obstacles stand in the way of physically implementing scalable quantum computers.

While several impressive physical realizations have been proposed for quantum computers (see [3] for
a classification scheme of different quantum computing technologies), all of these technologies have serious
intrinsic limitations [4]. Among the different technological constraints, limited interaction distance between
gate qubits is one of the most common ones. Although arbitrary-distance interaction between qubits is
possible in quantum computer technologies with moving qubits (for example, in a photon-based system [5]),
restrictions exist in other quantum technologies. In fact, many physical quantum computer proposals only
permit interactions between adjacent (nearest neighbor) qubits [6]. For example, trapped ions (e.g., [7]),
liquid nuclear magnetic resonance (NMR) (e.g., [8]), and the original Kane model [5] have been designed
based on interactions between linear nearest neighbor (LNN) qubits. The LNN architecture is often
considered an appropriate approximation to a scalable quantum architecture. If one can show that a
circuit can be realized efficiently on an LNN architecture, it can be run on many other architectures as
well [10].

The efficient realization of a given quantum algorithm for LNN architectures is an active research
area. In recent years, the effect of restricted interactions on several specific quantum algorithms has
been studied. For example, the physical implementation of the quantum Fourier transformation (QFT),
Shor's factorization algorithm [6,11,12], quantum addition [13], and quantum error correction [14] for
LNN architectures has been explored in the past. Besides that, researchers have also considered the effects of
LNN architectures on the synthesis of general quantum/reversible circuits. In [15] and [16], the worst-case
synthesis cost of a general unitary matrix under the nearest neighbor restriction has been discussed. It
has been shown that restricting CNOT gates to nearest neighbor interactions increases the CNOT count of
[16] by at most a factor of 9. The authors of [17] showed that translating an arbitrary circuit to LNN
architectures requires a linear increase in the quantum cost with respect to the number of qubits. In
[18,19,20], heuristic methods for converting an arbitrary circuit to its equivalent on LNN architectures have
been proposed. However, their performance is limited as discussed later.

In this paper, we suggest extensions of the existing synthesis flow aimed to realize circuits for LNN 
architectures. We show that with a naive treatment of the LNN restriction, quantum circuits require up 
to one order of magnitude higher quantum cost in the LNN architectures. In contrast, if this restriction 
is explicitly considered by the proposed synthesis flow, this increase can be reduced by more than 50% on 
average (83% in the best case). To this end, the following approaches are proposed: 

— An improved template-matching post-synthesis optimization method that reduces the circuit cost for 
LNN architectures, 

— an exact synthesis method for small functions realizing circuits with nearest neighbor interaction, and 

— reordering strategies, which modify the initial qubit locations in order to reduce the distance between 
non-neighbored qubits. 

The remainder of this paper is organized as follows. In Section 2, basic concepts are introduced. Next,
we briefly review the naive synthesis flow for LNN architectures in Section 3. Following this, Section 4
describes in detail the proposed synthesis and optimization approaches with explicit consideration of the LNN
limitation. How to combine the respective approaches into an integrated flow is sketched in Section 5.
Finally, experimental results are given in Section 6 and conclusions are drawn in Section 7.

2 Background 

2.1 Reversible Logic 

A function $f : \mathbb{B}^n \rightarrow \mathbb{B}^n$ over variables $X = \{x_1, \ldots, x_n\}$ is reversible if it maps each input assignment to
a unique output assignment. Such a function must have the same number of input and output variables. In
this paper, $n$ always refers to the number of inputs/outputs. A circuit realizing a reversible
function is a cascade of reversible gates. Common reversible gates include:

— A multiple control Toffoli gate $t_m$ has the form $t_m(C, t)$, where $C = \{x_{i_1}, \ldots, x_{i_m}\} \subset X$ is the set of
control lines and $t = \{x_j\}$ with $C \cap t = \emptyset$ is the target line. The value of the target line is inverted iff
all control lines are assigned to 1. For $m=0$ and $m=1$, the gates are called NOT gate and CNOT gate,
respectively. For $m=2$, the gate is called C$^2$NOT gate or Toffoli gate.

— A multiple control Fredkin gate $f_m$ has two target lines and $m$ control lines. The gate interchanges the
values of the target lines iff the conjunction of all $m$ control lines evaluates to 1. For $m=0$, the gate is
called SWAP gate.

— A Peres gate $P$ has one control line $x_i$ as well as two target lines $x_{j_1}$ and $x_{j_2}$. It represents a $t_2(\{x_i, x_{j_1}\}, x_{j_2})$
and a $t_1(\{x_i\}, x_{j_1})$ in a cascade.

Reversible logic has applications in various fields including quantum computation. 
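
The gate definitions above can be illustrated with a minimal Python sketch. It assumes circuit lines are indexed from 0 and an assignment is a tuple of bits; the function names (`toffoli`, `fredkin`, `peres`, `is_reversible`) are hypothetical and chosen only for this illustration.

```python
from itertools import product

def toffoli(bits, controls, target):
    """Apply t_m(C, t): invert the target iff all control lines are 1."""
    out = list(bits)
    if all(bits[c] for c in controls):  # an empty control set gives a NOT gate
        out[target] ^= 1
    return tuple(out)

def fredkin(bits, controls, t1, t2):
    """Apply f_m: interchange the two target lines iff all control lines are 1."""
    out = list(bits)
    if all(bits[c] for c in controls):  # an empty control set gives a SWAP gate
        out[t1], out[t2] = out[t2], out[t1]
    return tuple(out)

def peres(bits, control, t1, t2):
    """Peres gate: t_2({x_i, x_j1}, x_j2) followed by t_1({x_i}, x_j1)."""
    bits = toffoli(bits, [control, t1], t2)
    return toffoli(bits, [control], t1)

def is_reversible(func, n):
    """A function on n bits is reversible iff its 2^n outputs are all distinct."""
    outputs = {func(bits) for bits in product((0, 1), repeat=n)}
    return len(outputs) == 2 ** n

print(is_reversible(lambda b: toffoli(b, [0, 1], 2), 3))  # True
```

Checking that the set of outputs has $2^n$ elements is simply the definition of reversibility restated for a truth table, so the check is only practical for small $n$.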

Fig. 1 A reversible circuit and its decomposed circuit: (a) a given circuit, (b) the decomposed circuit with elementary gates

2.2 Quantum Logic 

A quantum bit, qubit for short, can be realized by a physical system such as a photon. Each qubit has two
basis states, $|0\rangle$ and $|1\rangle$, and can be in any linear combination of its basis states (called superposition), as
shown in (1), where $\alpha$ and $\beta$ are complex numbers.

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle \qquad (1)$$



An $n$-qubit quantum gate is a device which performs a specific $2^n \times 2^n$ unitary operation on selected $n$
qubits in a specific period of time. A matrix $U$ is unitary if $UU^\dagger = I$, where $U^\dagger$ is the conjugate transpose
of $U$ and $I$ is the identity matrix. Various quantum gates with different functionalities have
been introduced previously. For example, Hadamard (H), Controlled-V, and Controlled-V$^+$ gates are defined by the
following unitary matrices:
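
For reference, the forms commonly given for these gates in the quantum circuit literature are written out below. The block notation for the controlled gates (identity on the control value 0, $V$ or $V^+$ on the control value 1) is a standard convention assumed here rather than taken from this text.

```latex
H = \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
V = \frac{1+i}{2}
\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}, \qquad
V^{+} = \frac{1-i}{2}
\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix},

\text{Controlled-}V =
\begin{pmatrix} I_2 & 0 \\ 0 & V \end{pmatrix}, \qquad
\text{Controlled-}V^{+} =
\begin{pmatrix} I_2 & 0 \\ 0 & V^{+} \end{pmatrix}.
```

With these definitions, $V V^{+} = I$ and $V^2$ equals the NOT matrix, which is why two consecutive Controlled-V gates on the same lines act as a CNOT gate.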

2.3 Synthesis Cost

Each Toffoli, Fredkin, and Peres gate can be decomposed into a quantum circuit composed of a sequence
of elementary quantum gates [1]. Each elementary gate performs a single physical operation in a certain
quantum computing technology. The number of elementary gates required to realize a given reversible
gate is called quantum cost. It has been shown that NOT, CNOT, Controlled-V, and Controlled-V$^+$ gates
can be realized efficiently in quantum computer technologies [21]. These gates are usually considered as
elementary gates for reversible Boolean functions [22]. Thus, we stay with this definition in the following
sections. However, in other technologies not only this restricted set, but all one-qubit gates and all two-qubit
gates are considered as elementary gates [1]. This case is considered separately in the experimental
evaluation of the proposed approach in Section 6.



Fig. 1(a) shows a Toffoli gate and a Fredkin gate in a cascade. The resulting (decomposed) quantum
circuit is depicted in Fig. 1(b). Here, the control lines are denoted by • while the target lines are denoted
by ⊕, ×, a V box, or a V$^+$ box, respectively. As can be seen, a $t_2$ gate is decomposed into 5 elementary
gates, while an $f_1$ gate is decomposed into 7 elementary gates. For larger gates, the respective
decomposition depends on the number $n - m$ of unused circuit lines: for $n \geq 5$ and $m \in \{3, 4, \ldots, \lceil n/2 \rceil\}$, a
$t_m$ gate can be decomposed into a linear-size circuit which contains $12m - 22$ elementary gates. In addition,
for $n \geq 7$, a $t_{n-1}$ gate can be decomposed into $24n - 88$ elementary gates with no auxiliary bits [23]. Finally,
a $t_{n-1}$ gate can be decomposed into $2^n - 3$ elementary gates if no unused circuit line is available [22]. The
cost of an $f_m$ gate ($1 \leq m \leq n - 2$) is the cost of a $t_{m+1}$ gate plus two pp. The most
efficient applicable decomposition is always used.

3 The Naive Synthesis Flow for the LNN Architectures

Reversible circuits can be synthesized using multiple control Toffoli gates first, which are afterwards mapped
to elementary quantum gates. Alternatively, elementary gates can be applied directly during the
synthesis process. While for the latter case only small circuits have been determined so far (e.g., see
[24,25]), approaches for Toffoli network synthesis can handle larger functions and circuits (e.g., see
[26,27,28,29,30,31,32]). However, both approaches often lead to sub-optimal circuits with respect to LNN
architectures since the number of elementary gates (i.e., the quantum cost) is optimized without an explicit
consideration of the LNN restriction. The same problem exists for quantum circuit synthesis algorithms
[15,16].

In order to measure the cost of the LNN restriction, a cost metric is defined. Consider a 2-qubit quantum
gate $g$ whose control and target are placed at the $c$-th line and at the $t$-th line ($0 < c, t < n$), respectively.
The NNC (nearest neighbor cost) of $g$ is defined as $|c - t| - 1$ (i.e., the number of lines between the control and
target lines). The NNC of a circuit is defined as the sum of the NNCs of its gates. The optimal NNC of a circuit is 0,
which is reached when all quantum gates are either 1-qubit gates or 2-qubit gates performed on adjacent qubits.
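
As a small illustration of this metric, the following Python sketch computes the NNC of a gate and of a circuit. It assumes a gate is given as a tuple of the line indices it acts on (one index for a 1-qubit gate, a control/target pair for a 2-qubit gate); the function names and the example gate list are hypothetical.

```python
def gate_nnc(gate):
    """NNC of a single gate: |c - t| - 1 for a 2-qubit gate, 0 for a 1-qubit gate."""
    if len(gate) < 2:
        return 0
    control, target = gate
    return abs(control - target) - 1

def circuit_nnc(circuit):
    """NNC of a circuit: the sum of the NNCs of its gates."""
    return sum(gate_nnc(g) for g in circuit)

# Illustrative five-gate cascade on lines 0..2 whose first gate spans lines 0 and 2.
example = [(0, 2), (0, 1), (1, 2), (0, 1), (1, 2)]
print(circuit_nnc(example))  # 1
```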

Since synthesis algorithms may use several non-elementary gates during synthesis, all non-elementary
gates should be decomposed into a set of elementary unit-cost gates for physical implementation. The
decomposition methods proposed in [22,23,16] are extensively used for this purpose. However, after
applying one of the available synthesis and/or decomposition algorithms, non-optimal circuits with respect
to NNC may result. For example, Fig. 2(a) shows the standard decomposition of a Toffoli gate, which leads
to an NNC value of 1. To make this circuit applicable to LNN architectures, SWAP gates must be
applied for each non-adjacent quantum gate. More precisely, SWAP gates are added in front of each gate
$g$ with non-adjacent control and target lines to "move" the control (target) line of $g$ towards the target
(control) line until they become adjacent. Afterwards, SWAP gates are added to restore the original
ordering of circuit lines. Similar methods have been applied by previous synthesis methods considering the LNN
restriction [20,15,18,19,16,33].

Example 1 Consider the standard decomposition of a Toffoli gate as depicted in Fig. 2(a). As can be seen,
the first gate is non-adjacent. Thus, to achieve NNC-optimality, SWAP gates are inserted in front of and after
the first gate (see Fig. 2(b)). Since each SWAP gate requires 3 elementary quantum gates$^1$, this increases
the total quantum cost to 11, but leads to an NNC value of 0.

By inserting SWAP gates consecutively for each non-adjacent gate, a quantum circuit with an NNC of 0
(and thus applicable to LNN architectures) can be determined in linear time. This method is denoted
by naive NNC-based decomposition in the rest of this paper. However, synthesizing
quantum circuits for LNN architectures using this method (or similar approaches like [20,15,18,19,16]) often
leads to a significant increase in the quantum cost. In contrast, smaller realizations (with an NNC of 0)
are often possible. As an example, consider Fig. 2(c), which shows an NNC-optimal decomposition with a quantum
cost of 9 (instead of 11). In the next sections, a synthesis flow is described that explicitly takes NNC into
account. Hence, better quantum circuit realizations for LNN architectures can be found, as shown in the
experimental results section.
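
The naive NNC-based decomposition can be sketched in Python as follows. This is only an illustration under stated assumptions: the circuit contains 2-qubit gates given as (control, target) pairs, only the control line is moved, and a SWAP gate costs 3 elementary gates by default. The names `naive_nnc_decomposition` and `quantum_cost` are hypothetical.

```python
def naive_nnc_decomposition(circuit):
    """Return ('SWAP', i, i+1) and ('GATE', c, t) steps in which every step is adjacent."""
    result = []
    for control, target in circuit:
        step = 1 if target > control else -1
        swaps = []
        pos = control
        # Move the control line towards the target until both are adjacent.
        while abs(target - pos) > 1:
            swaps.append(("SWAP", min(pos, pos + step), max(pos, pos + step)))
            pos += step
        result.extend(swaps)
        result.append(("GATE", pos, target))
        # Undo the SWAPs in reverse order to restore the original line ordering.
        result.extend(reversed(swaps))
    return result

def quantum_cost(steps, swap_cost=3):
    """Count elementary gates, charging swap_cost for every SWAP step."""
    return sum(swap_cost if s[0] == "SWAP" else 1 for s in steps)

# Five-gate cascade whose first gate spans lines 0 and 2 (illustrative gate order).
steps = naive_nnc_decomposition([(0, 2), (0, 1), (1, 2), (0, 1), (1, 2)])
print(quantum_cost(steps))  # 11
```

For this input, two SWAP gates (6 elementary gates) are added to the 5 original gates, which matches the cost of 11 reported in Example 1. Passing `swap_cost=1` models technologies in which the SWAP gate itself is elementary.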

4 Explicit Consideration of NNC 

In this section, we propose new synthesis and optimization approaches that explicitly take NNC into
account. More precisely, a template-matching post-optimization algorithm is introduced to simplify the circuits
resulting from the existing synthesis flow. Furthermore, an exact synthesis approach is proposed that
determines NNC-optimal circuits with minimal quantum cost. The resulting circuits can later be exploited
to optimize large circuits. Finally, two heuristic approaches are introduced that modify the initial qubit
locations in order to remove unnecessary SWAP gates and thereby reduce the cost.

4.1 NNC-based Template Matching 

The idea of exploiting templates was originally proposed in [33] and extended in [35] for LNN
architectures. In this section, further templates for LNN architectures are proposed that outperform the previous
ones as shown below.

1 As mentioned above, in certain quantum technologies all two-qubit gates are considered as elementary gates. Thus, in
this case the SWAP gate is seen as an elementary gate, increasing the cost by 1 instead of 3. While this special case is not
considered in the following, it is evaluated separately in the experimental evaluation in Section 6.

Fig. 2 Different decompositions of a Toffoli gate

Fig. 3 Proposed templates: (a) with one SWAP gate, (b) with two SWAP gates, (c) with three SWAP gates



Two neighboring gates can be interchanged if the target line of the first gate is not equal to the control
lines of the second gate and vice versa (moving rule). In addition, two neighboring SWAP gates with the
same target lines can be removed (deletion rule). The general idea of template matching is to replace a
cascade of reversible gates by a different cascade with the same functionality and afterwards to apply the
moving and deletion rules to optimize the circuit. Following this approach, templates with one, two,
and three SWAP gates are proposed in Fig. 3(a), Fig. 3(b), and Fig. 3(c), respectively. The $U_i$ boxes thereby
represent any one-qubit or two-qubit gate. A $U_i'$ box represents the same gate as a $U_i$ box, but possibly
with interchanged control and target lines.
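
The deletion rule can be sketched in Python as follows. This is a simplified illustration, not the template matcher itself: steps are assumed to be tuples `("SWAP", i, j)` with `i < j` or `("GATE", c, t)`, and the moving rule is replaced by a more conservative condition that only moves a SWAP gate past gates acting on entirely different lines. The function names are hypothetical.

```python
def lines_of(step):
    """Set of circuit lines a step acts on."""
    return set(step[1:])

def apply_deletion_rule(steps):
    """Remove pairs of identical SWAP gates that are neighbors, or that become
    neighbors after moving one of them past gates on disjoint lines."""
    result = []
    for step in steps:
        if step[0] == "SWAP":
            k = len(result) - 1
            # Look backwards past gates that share no line with this SWAP.
            while k >= 0 and not (lines_of(result[k]) & lines_of(step)):
                k -= 1
            if k >= 0 and result[k] == step:
                del result[k]  # the two SWAPs cancel each other
                continue
        result.append(step)
    return result

# Two consecutive non-adjacent gates after naive SWAP insertion (illustrative).
steps = [("SWAP", 0, 1), ("GATE", 1, 2), ("SWAP", 0, 1),
         ("SWAP", 0, 1), ("GATE", 1, 2), ("SWAP", 0, 1)]
print(apply_deletion_rule(steps))
# [('SWAP', 0, 1), ('GATE', 1, 2), ('GATE', 1, 2), ('SWAP', 0, 1)]
```

Gates on disjoint lines always commute, so this restricted form of moving is safe for every gate type; the moving rule stated above permits more interchanges than the sketch exploits.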

As an example, consider the circuit shown in Fig. 4(a) with a quantum cost of 16. By applying the template
introduced in Fig. 3(b), the circuit shown in Fig. 4(b) results. Now, a 1-SWAP template (Fig. 3(a)) can
be applied, leading to the circuit depicted in Fig. 4(c). Finally, by applying the deletion rule, gates can be
removed, and the final quantum cost is improved by about 37%. The final circuit is shown in Fig. 4(d).

The authors of [35] introduced a set of nearest neighbor templates for Toffoli and CNOT combinations. It
can be verified that the templates introduced in Fig. 6(b) and Fig. 6(c) of [35] can be found by applying the
deletion rule. Moreover, consider the circuit shown in Fig. 5(a), which includes a Toffoli-CNOT combination.
Fig. 5(b) illustrates a template as proposed in Fig. 6(a) of [35]. This circuit still has to be decomposed into
elementary gates, leading to a circuit with a quantum cost of 30 as shown in Fig. 5(c). On the other hand,
consider the circuit shown in Fig. 5(d), obtained by applying the naive method to the circuit of Fig. 5(a).
The equivalent circuit after applying the templates introduced in Fig. 3(b) is given in Fig. 5(e). Applying
the deletion rule finally leads to a circuit with a quantum cost of 24 as shown in Fig. 5(f). Thus, applying the
templates proposed in this paper in conjunction with the deletion rule improves the result of [35] by 20%.

Besides that, the efficiency of the proposed templates is illustrated by the following practically relevant
example.



M. Saeedi, R. Wille, R. Drechsler

Fig. 4 Application of the proposed NNC-based templates: (a) a circuit with quantum cost of 16, (b) applying the 2-SWAP template, (c) applying the 1-SWAP template, (d) final circuit with quantum cost of 10

Fig. 5 An existing nearest neighbor template for a Toffoli-CNOT combination and our proposed template: (c) the circuit of Fig. 5(b) after decomposition (quantum cost of 30), (d) the circuit of Fig. 5(a) after applying the naive method (quantum cost of 42), (e) the circuit of Fig. 5(d) after applying 2-SWAP templates, (f) the resulting simplified circuit (quantum cost of 24)

Fig. 6 Circuits realizing the Approximate Quantum Fourier Transform (AQFT)



Example 2 Consider the circuit shown in Fig. 6(a), which is the approximate quantum Fourier transform
circuit (AQFT) [36] with 36 SWAP gates obtained by the method of [11] for 8 qubits and an approximation
parameter of 5. Note that $R_k$ is the rotation by $2\pi/2^k$ and H is the Hadamard gate. Fig. 6(b) is an equivalent
circuit with 24 SWAP gates constructed by a method recently introduced in [5D]. On the other hand, applying
the proposed templates to the result of [11] leads to the circuit with 20 SWAP gates illustrated in Fig. 6(c).



4.2 Exploiting Exact Synthesis 



A few exact synthesis methods for quantum circuits have recently been introduced. They generate quantum
circuits with minimal quantum cost (for example, see [24,25]). However, no approach to determine optimal
circuits for LNN architectures has been proposed so far. In this section, an exact synthesis algorithm is
proposed to construct quantum circuits with both minimal quantum cost and minimal NNC.

The developed approach is similar to the one introduced in [25]. Here, the synthesis problem is expressed
as a sequence of Boolean satisfiability (SAT) instances. For a given function $f$, it is checked whether a circuit with $c$
gates realizing $f$ exists. Thereby, $c$ is initially set to 1 and increased in each iteration if no realization
is found.

More formally, for a given $c$ and a reversible function $f : \mathbb{B}^n \rightarrow \mathbb{B}^n$, the following SAT instance is created:

$$\Phi \wedge \bigwedge_{i=0}^{2^n-1} \left( [\mathit{inp}_i]_2 = i \,\wedge\, [\mathit{out}_i]_2 = f(i) \right),$$

where

— $\mathit{inp}_i$ is a Boolean vector representing the inputs of the network to be synthesized for truth table line $i$,

— $\mathit{out}_i$ is a Boolean vector representing the outputs of the network to be synthesized for truth table line $i$,
and

— $\Phi$ is a set of constraints representing the synthesis problem for a given gate library.

Table 1 List of available macros (columns: n, Macro, Cost). Macro entries include P({a,c},b), P({c,a},b), P({a,b},d), P({d,c},a), t2({a,b},c), t2({c,b},a), t2({d,b},a), and t2({a,c},d).

Fig. 7 Circuit of Example 3

The difference in comparison to [25] is that the constraints in $\Phi$ do not represent the whole set of elementary
quantum gates; instead, a restricted gate library with only adjacent gates is applied.
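
The iterative scheme (start with $c = 1$ and increase $c$ until a realization is found) can be illustrated with the Python sketch below. It is not the SAT formulation described above: plain exhaustive enumeration is substituted for the SAT solver, the library is limited to NOT gates and adjacent CNOT gates (Controlled-V and Controlled-V$^+$ are omitted because they are not Boolean), and the number of gates is minimized instead of a weighted quantum cost. All names are hypothetical.

```python
from itertools import product

def adjacent_gate_library(n):
    """NOT gates plus CNOT gates restricted to neighboring lines."""
    gates = [("NOT", None, t) for t in range(n)]
    for i in range(n - 1):
        gates.append(("CNOT", i, i + 1))
        gates.append(("CNOT", i + 1, i))
    return gates

def apply_gate(gate, bits):
    """Invert the target if the gate has no control or its control line is 1."""
    _, control, target = gate
    out = list(bits)
    if control is None or bits[control]:
        out[target] ^= 1
    return tuple(out)

def realizes(circuit, f, n):
    """Check the circuit against every line of the truth table of f."""
    for bits in product((0, 1), repeat=n):
        state = bits
        for gate in circuit:
            state = apply_gate(gate, state)
        if state != f(bits):
            return False
    return True

def exact_synthesis(f, n, max_gates=6):
    """Return a circuit with the fewest gates from the adjacent library, or None."""
    library = adjacent_gate_library(n)
    for c in range(1, max_gates + 1):  # c = 1, 2, ... as in the text
        for circuit in product(library, repeat=c):
            if realizes(circuit, f, n):
                return list(circuit)
    return None

def cnot_0_to_2(bits):
    """A CNOT gate with control on line 0 and target on line 2."""
    return (bits[0], bits[1], bits[2] ^ bits[0])

print(len(exact_synthesis(cnot_0_to_2, 3)))  # 4
```

Since the first circuit found has the smallest possible $c$, it is minimal in gate count; the enumeration grows exponentially with $c$, which mirrors the limitation to small functions discussed next.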

Although solving the generated SAT instances using a modern SAT solver can produce optimal
circuits, the applicability of the exact method is limited to functions with a small number of qubits and gates
due to the exponential search space. In practice, the proposed exact method is sufficient to construct minimal
realizations with respect to both quantum cost and NNC for a set of Toffoli and Peres gate configurations
as shown in Table 1. These optimal circuits can in turn be exploited to improve the naive NNC-based
decomposition method. More precisely, once an exact NNC-optimal quantum circuit for a function is available
(denoted by macro in the following), the decomposition from the naive approach is replaced by the optimal
circuit. The following example illustrates the idea.

Example 3 Reconsider the decomposition of a Toffoli gate as depicted in Fig. 2. By applying the proposed
exact synthesis approach, an NNC-optimal quantum circuit as shown in Fig. 2(c) results. In comparison
to the naive method (see Fig. 2(b)), this reduces the quantum cost from 11 to 9 while still ensuring NNC
optimality.

After finding the optimal decomposition of a given gate, it can be used as a macro to simplify other
circuits. For example, consider the circuit shown in Fig. 7. Here, for the second gate the naive method is
applied and SWAPs are added, while for the remaining ones the obtained macro is used. This enables a
quantum cost reduction from 96 to 92.

Moreover, Fig. 8(b) and Fig. 8(c) show the NNC-optimal circuit of the Peres gate obtained by the naive
and by the exact approach, respectively. As illustrated, applying the naive approach leads to a quantum cost
of 28, while the optimal circuit has a quantum cost of only 11.

In total, we generated 13 macros, as listed in Table 1 together with the respective costs in comparison
to the costs obtained by using the naive method. As can be seen, exploiting these macros reduces the cost
for each gate by up to 63%. The effect of these macros on the decomposition of larger circuits is considered
in detail in the experimental results section.



4.3 Reordering Circuit Lines 

Applying the approaches introduced so far leads to an increase in the quantum cost for each non-adjacent
gate. In contrast, by modifying the ordering of the circuit lines, some of the additional costs can be saved.
As an example, consider the circuit in Fig. 9(a) with a quantum cost of 3 and an NNC value of 6. By reordering
the lines as shown in Fig. 9(b), the NNC value can be reduced to 1 without increasing the total quantum
cost. It is worth noting that manipulating the line order has previously been used to reduce the quantum
cost (e.g., in [26,37]). To determine which lines should be reordered, two heuristic methods are proposed in
the following. The former changes the ordering of the primary inputs and outputs according to a global
view, while the latter applies a local view to assign the line ordering.

Fig. 8 NNC-based synthesis of a Peres gate: (a) original circuit, (b) circuit obtained by the naive method, (c) circuit obtained by exploiting exact synthesis

Fig. 9 Reordering circuit lines



4.3.1 Global Reordering

After applying the standard decomposition algorithms [22,23], a cascade of 1- and 2-qubit gates is
generated. Now, an ordering of the circuit lines which reduces the total NNC value is desired. To do that, the
"contribution" of each line to the total NNC value is calculated. More precisely, for each gate $g$ with control
line $i$ and target line $j$, the NNC value is calculated. This value is added to variables $imp_i$ and $imp_j$, which
are used to save the impacts of the circuit lines $i$ and $j$ on the total NNC value, respectively. Next, the line
with the highest NNC impact is chosen for reordering and placed at the middle line (i.e., swapped with the
middle line). If the selected line is the middle line itself, the line with the next highest impact is selected. This
procedure is repeated until no better NNC value is achieved. Finally, SWAP operations as described in the
previous sections are added for each non-adjacent gate. The following example illustrates the idea.
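
A minimal Python sketch of this heuristic is given below, under several assumptions that go beyond the text: gates are (control, target) pairs of line labels 0..n-1 with n at least 2, the middle line is taken as index `n // 2`, the full NNC of a gate is added to both of its lines, and ties are broken by label order. The names `total_nnc` and `global_reordering` are hypothetical.

```python
def total_nnc(circuit, position):
    """Total NNC when the line with label l is placed at physical line position[l]."""
    return sum(abs(position[c] - position[t]) - 1 for c, t in circuit)

def global_reordering(circuit, n):
    """Return position[label] after repeatedly moving the highest-impact line to the middle."""
    position = list(range(n))
    middle = n // 2  # assumed definition of the middle line
    best = total_nnc(circuit, position)
    while True:
        # Accumulate the NNC of each gate on both lines it touches.
        impact = [0] * n
        for c, t in circuit:
            nnc = abs(position[c] - position[t]) - 1
            impact[c] += nnc
            impact[t] += nnc
        # Pick the highest-impact line that is not already in the middle.
        ranked = sorted(range(n), key=lambda label: -impact[label])
        chosen = next(label for label in ranked if position[label] != middle)
        occupant = position.index(middle)
        trial = position[:]
        trial[chosen], trial[occupant] = trial[occupant], trial[chosen]
        cost = total_nnc(circuit, trial)
        if cost >= best:  # no better NNC value: stop
            return position
        position, best = trial, cost

circuit = [(0, 2), (0, 3)]  # illustrative gates on four lines
order = global_reordering(circuit, 4)
print(order, total_nnc(circuit, order))  # [2, 1, 0, 3] 1
```

Scaling all impacts by a common factor, for instance splitting the NNC of a gate evenly between its two lines, would not change which line is ranked highest, so the sketch does not depend on that choice.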



Example 4 Consider the circuit depicted in Fig. 10(a). After calculating the NNC contributions, we have
$imp_a = 1.5$, $imp_b = 0$, $imp_c = 0.5$, and $imp_d = 1$, respectively. Thus, lines a (highest impact) and c (middle
line) are swapped. Since further swapping does not improve the NNC value, reordering terminates and
SWAP gates are added for the remaining non-adjacent gates. The resulting circuit is depicted in Fig. 10(b)
and has a quantum cost of 9 in comparison to 21 that results if the naive method is applied.



4.3.2 Local Reordering

In order to save SWAP gates, line ordering can also be applied according to a local scheme as follows. The
circuit is traversed from the inputs to the outputs. As soon as there is a gate g with an NNC value greater 
than 0, a SWAP operation is added in front of g to enable an adjacent gate. However, in contrast to the 
naive NNC-based decomposition, no SWAP operation is added after g. Instead, the resulting ordering is 
used for the rest of the circuit (i.e., propagated through the remaining circuit). This process is repeated 
until all gates are traversed. 
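
The local scheme can be sketched in Python as follows. The sketch assumes 2-qubit gates given as (control, target) pairs of line labels, moves only the control line, and charges 3 elementary gates per SWAP by default; the name `local_reordering` is hypothetical.

```python
def local_reordering(circuit, n, swap_cost=3):
    """Return (steps, quantum cost, final line order) for the local scheme."""
    order = list(range(n))  # order[p] = label currently placed on physical line p
    steps = []
    for c, t in circuit:
        pc, pt = order.index(c), order.index(t)
        step = 1 if pt > pc else -1
        # Move the control next to the target; the SWAPs are not undone afterwards.
        while abs(pt - pc) > 1:
            a, b = pc, pc + step
            order[a], order[b] = order[b], order[a]
            steps.append(("SWAP", min(a, b), max(a, b)))
            pc += step
        steps.append(("GATE", pc, pt))
    cost = sum(swap_cost if s[0] == "SWAP" else 1 for s in steps)
    return steps, cost, order

steps, cost, order = local_reordering([(0, 1), (0, 2), (0, 2)], 3)
print(cost, order)  # 6 [1, 0, 2]
```

Because the ordering is propagated rather than restored, the outputs appear in a permuted line order (returned here as `order`), which a user of the circuit has to take into account.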



Fig. 10 Global and local reordering: (a) original circuit, (b) circuit after global reordering, (c) circuit after local reordering

Example 5 Reconsider the circuit depicted in Fig. 10(a). The first gate is not modified since it has an NNC
of 0. For the second gate, a SWAP operation is applied to make it adjacent. Afterwards, the new line
ordering is propagated to all remaining gates, resulting in the circuit shown in Fig. 10(c). This procedure is
repeated until the whole circuit has been traversed. Finally, a circuit with a quantum cost of 9 (in contrast
to 21) is produced.



5 A Synthesis Flow for LNN Architectures 



With the proposed approaches from the previous section available, they can be combined into an extended
synthesis flow that explicitly takes the LNN limitation into account. Fig. 11 illustrates this flow. As shown
in this figure, first an off-the-shelf synthesis approach is applied to create an initial circuit realization (line 1).
Afterwards, if macro replacement is enabled, the proposed macro replacement method from Section 4.2 is
applied to simplify the circuit (lines 2-3). Then, one of the available standard decomposition methods is
applied to decompose all non-elementary gates into a set of elementary unit-cost gates (line 4). The resulting
quantum circuit can be optimized by the reordering methods proposed in Section 4.3 (lines 5-8). Finally,
SWAP gates for the remaining non-adjacent gates have to be added (line 9), and template matching as
introduced in Section 4.1 can additionally be applied (lines 10-11). Note that each method is applied to the
result of the previous method in the proposed synthesis flow. It can be verified that for the naive method,
only lines 1, 4, and 9 are executed.





input: a given reversible or quantum specification
output: a synthesized circuit for LNN architectures
 1. synthesize the given specification using an appropriate synthesis method
 2. if macro replacement is enabled
 3.     apply the available macros
 4. decompose each gate into a set of elementary gates
 5. if global reordering is enabled
 6.     reorder initial qubit locations based on the global reordering method
 7. if local reordering is enabled
 8.     reorder initial qubit locations based on the local reordering method
 9. insert a set of SWAP gates for each non-adjacent gate
10. if template matching is enabled
11.     apply the available templates

Fig. 11 The extended synthesis flow



6 Experimental Results 

In this section, experimental results are presented. We evaluated the methods introduced in Section 4 and
compared them to the naive approach, which has been used by other synthesis methods [20,15,18,19,16,33]
so far. All approaches have been implemented in C++ and applied to the benchmark collection available at
RevLib [38], which includes a wide variety of circuits that have already been used by other researchers to evaluate
previous reversible synthesis approaches. The experiments have been carried out on an Intel Pentium IV
2.2GHz computer with 2GB memory.

The results are shown in Table 2 and Table 3, respectively. The former table shows the results obtained
by applying the established decomposition, where a SWAP gate is composed of three elementary gates.
Additionally, Table 3 shows the results obtained by assuming the SWAP gate itself to be an elementary
gate (as done by certain quantum technologies [1]). The first column gives the names of the circuits
followed by unique identifiers as used in RevLib. Then, the number of circuit lines (n), the gate count (gc),
the quantum cost (qc), and the NNC value of the original reversible circuits are shown. The following
columns denote the quantum cost of the NNC-optimal circuits obtained by the naive method (N) as well
as by the proposed synthesis flow where a combination of macro replacement (M), global reordering (G),
local reordering (L), and template matching (T) methods has been applied. For example, MT denotes the
results obtained by the proposed flow with macro replacement and template matching enabled (i.e.,
reordering methods disabled). Besides that, the results of the best configurations are given in column Best
config. The percentages of the best quantum cost reduction obtained by the extended synthesis flow in
comparison to the widely used naive method are reported in column Best Impr. Column Time denotes
the overall run-time needed to generate the results for all possible configurations (with and without any
possible options). Finally, the last column shows the remaining overhead in terms of quantum cost needed
to achieve NNC-optimality in comparison to the original circuit (Ohead).

As can be seen, decomposing reversible circuits into NNC-optimal quantum circuits for LNN architectures 
is costly. With the widely used naive method, the quantum cost increases significantly. This result 
has been obtained in recent synthesis papers as well [17, 16, 33]. However, the proposed methods 
improve on this. Even though reordering may worsen the results in a few cases, overall it leads to an 
improvement. Almost all results have been obtained in negligible run-time (i.e., in less than one CPU second). 
More run-time was needed only when template matching was enabled. 
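
To illustrate why the naive method is costly, the following sketch shows one common formulation of naive SWAP insertion, stated here as an assumption and not as the exact procedure used in the experiments: every two-qubit gate on non-adjacent lines is preceded by adjacent SWAP gates that move one operand next to the other, and followed by the same SWAP gates in reverse order. The gate representation, the function names, and the unit cost per elementary two-qubit gate are hypothetical choices made for this example.

```python
from typing import List, Tuple

Gate = Tuple[str, int, int]  # (name, first line, second line); hypothetical representation


def naive_swap_insertion(gates: List[Gate]) -> List[Gate]:
    """Make every two-qubit gate act on adjacent lines by surrounding it with SWAPs."""
    result: List[Gate] = []
    for name, a, b in gates:
        lo, hi = min(a, b), max(a, b)
        # SWAPs that move the qubit on line `lo` to line `hi - 1`.
        swaps = [("SWAP", k, k + 1) for k in range(lo, hi - 1)]
        result.extend(swaps)
        # After the SWAPs, the operand that started on `lo` sits on `hi - 1`.
        new_a = hi - 1 if a == lo else hi
        new_b = hi - 1 if b == lo else hi
        result.append((name, new_a, new_b))
        # Undo the SWAPs in reverse order to restore the original line order.
        result.extend(reversed(swaps))
    return result


def quantum_cost(gates: List[Gate], swap_cost: int) -> int:
    """Count one unit per elementary two-qubit gate and `swap_cost` units per SWAP."""
    return sum(swap_cost if name == "SWAP" else 1 for name, _, _ in gates)


circuit = [("CNOT", 0, 3), ("CNOT", 1, 2)]
lnn = naive_swap_insertion(circuit)
cost_swap3 = quantum_cost(lnn, swap_cost=3)  # SWAP composed of three elementary gates
cost_swap1 = quantum_cost(lnn, swap_cost=1)  # SWAP treated as an elementary gate
print(cost_swap3, cost_swap1)
```

In this toy circuit the first gate spans three lines and therefore needs two SWAP gates before and two after it, so the cost grows from 2 to 14 when a SWAP gate counts as three elementary gates, and to 6 when a SWAP gate counts as one.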

Overall, reductions of more than 50% on average (83% in the best case) have been observed 
considering the established decomposition (see Table 2). Similar results are obtained when applying the extended 
definition of elementary gates (see Table 3). As a result, NNC-optimal circuits can be synthesized with a 
moderate increase of quantum cost. 
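
The summary figures can be recomputed from the Best Impr. column of Table 2. The sketch below is a plain arithmetic check; the list holds the percentages as they are legible in that column, in row order, and the variable names are illustrative.

```python
from statistics import mean

# Best Impr. (%) per circuit, transcribed from Table 2 in row order.
best_impr = [
    19, 18, 41, 57, 71, 73, 72, 63, 50, 58, 52,  # 0410184_169 ... 4mod7-v0_95
    43, 43, 43, 43, 42, 59, 80, 63, 66, 83, 68,  # add16_174 ... ham7_104
    41, 59, 58, 54, 53, 59, 67, 67, 53, 45, 57,  # hwb4_52 ... plus63mod8192_164
    55, 63, 63, 63, 73, 58, 69, 65, 71, 71, 78,  # rd32-v0_67 ... urf6_160
]

average_impr = mean(best_impr)
best_case = max(best_impr)
print(f"{len(best_impr)} circuits, average {average_impr:.1f}%, best case {best_case}%")
```

With these values the average lies above 50% and the maximum is 83% (circuit ham15_108), in line with the statement above.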

7 Conclusions 

Quantum technologies are still at a preliminary stage, and several limitations must be resolved before a scalable 
quantum technology becomes available. Limited interaction distance between gate qubits is one of the most common limitations 
of the current technologies. In this paper, we illustrated how the synthesis flow can be modified to produce 
efficient circuits for quantum technologies with limited interactions. The proposed flow includes a set of NNC-based 
decomposition methods complemented by an NNC-based template matching algorithm. The experiments 
show that with a naive treatment of the LNN restriction, quantum circuits require up to one order of 
magnitude higher quantum cost in the LNN architectures. In contrast, using the suggested methods, this 
increase can be reduced by more than 50% on average (83% in the best case). 

Table 2 Experimental results (considering a SWAP gate to be composed of 3 elementary gates) 

                   Original circuit         Decomposed circuit
CIRCUIT               n    gc     qc     NNC   n        N        T        M       MT        G        L       GL     Best   Best   Best    Time  Ohead
                                                       qc       qc       qc       qc       qc       qc       qc       qc Method Impr.%    Sec.
0410184_169          14    46     90      68  14      234      212      197      189      234      423      423      189     MT     19       -   2.10
3_17_13               3     6     14       8   3       32       32       28       26       32       32       32       26     MT     18       -   1.86
4_49_17               4    12     32      64   4      158      104      120      102      128       98       98       92     GT     41       -   2.88
4gt10-v1_81           5     6     34     164   5      282      120      282      120      258      150      147      120      T     57       -   3.53
4gt11_84              5     3      7      26   5       49       31       47       29       25       22       16       14    MGL     71       -   2.00
4gt12-v1_89           5     5     42     320   6      525      195      525      195      321      171      168      141     GT     73       -   3.36
4gt13-v1_93           5     4     16     104   5      173       83      173       83       77       56       53       47     GT     72       -   2.94
4gt4-v0_80            5     5     34     218   6      366      168      364      166      168      138      141      132     GT     63       -   3.88
4gt5_75               5     5     21      76   5      142       94      138       96      118       82       79       70     GT     50       -   3.33
4mod5-v1_23           5     8     24      90   5      174       84      155      101      114       78       78       72    GLT     58       -   3.00
4mod7-v0_95           5     6     38     144   5      256      140      256      140      352      127      121      121     GL     52       -   3.18
add16_174            49    64    192     220  49      762      446      473      473      762     1104     1104      428    MGL     43       -   2.23
add32_183            97   128    384     444  97     1530      894      953      953     1530     3744     3744      860    MGL     43       2   2.24
add64_184           193   256    768     892 193     3066     1790     1913     1913     3066    13632    13632     1724    MGL     43      14   2.24
add8_172             25    32     96     108  25      378      222      233      233      378      360      360      212    MGL     43       -   2.21
aj-e11_165            4    13     45     144   5      280      166      260      164      280      181      181      160     GT     42       -   3.56
alu-v4_36             5     7     31     136   5      242      146      238      148      218      113      104       98     GT     59       -   3.16
cnt3-5_180           16    20    120    1634  16     2621      613     2591      677     1457      731      728      511     GT     80       -   4.26
cycle10_2_110        12    19   1126   13472  12    21420    13700    21420    13700    21420     8046     8046     7874     LT     63       4   6.99
decod24-v3_46         4     9      9      36   4       63       27       63       27       39       21       24       21      L     66       -   2.33
ham15_108            15    70    453    9978  15    15494    10610    15390    10582    14030     2627     2588     2588     GL     83       2   5.71
ham7_104              7    23     83     624   7     1035      681     1027      695      657      342      333      327     GT     68       -   3.94
hwb4_52               4    11     23      40   4      107       77       83       63      107       65       65       63     MT     41       -   2.74
hwb5_55               5    24    104     470   5      823      407      817      415      595      337      340      335     LT     59       -   3.22
hwb6_58               6    42    142     710   6     1304      692     1160      672     1268      614      545      542    GLT     58       -   3.82
hwb7_62               7   331   2325   16890   8    27967    15547    27869    15533    25939    13390    12955    12853     LT     54       4   5.53
hwb8_118              8   633  14260  115030   9   187272    96906   186880    96834   182196    87495    87498    87495      L     53      39   6.14
hwb9_123              9  1959  18124  189426  10   304659   168147   304540   168160   302481   124068   124041   124041     GL     59      74   6.84
mod5adder_128         6    15     83     600   6     1011      435      978      432      675      330      333      330      L     67       -   3.98
mod8-10_177           5    14     88     582   6      975      407      969      409      621      372      363      317      ?     67       -   3.60
plus127mod8192_162   13   910  57400  661596  14  1057946   675624  1057804        ?        ?        ?   503516   496698     LT     53     376   8.65
plus63mod4096_163    12   429  25492  254864  13        ?        ?        ?   256778   407926   210400        ?   210100     LT     45     113   8.24
plus63mod8192_164    13   492  32578  397864  14   633994   409384   633852   409358   633994   279016   279016   271030     LT     57     187   8.32
rd32-v0_67            4     2     10      10   4       38       26       19       17       20       32       20       17     MT     55       -   1.70
rd53_135              7    16     77     466   7      822      450      750      456      702      330      303      303     GL     63       -   3.94
rd73_140             10    20     76     450  10      790      400      739      401      646      304      295      286     LT     63       -   3.76
rd84_142             15    28    112     910  15     1516      626     1465      639     1696      556      586      556      L     63       -   4.96
sym9_148             10   210   4368   48736  10    77556    46110    77556    46110    67428    20643    25023    20640     LT     73      11   4.73
sys6-v0_144          10    15     67     358  10      638      326      587      329      842      263      308      263      L     58       -   3.93
urf1_149              9 11554  57770  462708   9   794582   353732   735170   329762   659150   238475   238490   238475      L     69     261   4.13
urf2_152              8  5030  25150  171284   8   297178   133606   276882   126348   297178   101683   101683   101656     LT     65      56   4.04
urf3_155             10 26468 132340 1282724  10  2121808   897358  2038584   874848  1933372   596368   596371   596356     LT     71    1298   4.51
urf5_158              9 10276  51380  442748   9   740084   333674   706412   321496   667484   208709   208706   208700     LT     71     231   4.06
urf6_160             15 10740  53700  951276  15  1487904   589662  1478080   586572  1334916   320412   320409   320400     LT     78     596   5.97

N: Naive method (i.e., synthesis, decomposition, SWAP insertion); T: with template matching; M: with macro replacement; G: with global reordering; L: with local reordering. 



Column Time denotes the overall run-time needed to generate the results for all possible configurations. 

Almost all results have been obtained in negligible run-time (i.e., in less than one CPU second). More run-time was needed only when template matching was enabled. 
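
The configurations referred to in the notes above can be enumerated mechanically. The sketch below assumes that a configuration is any subset of the four options M, G, L, and T, with the empty subset corresponding to the naive method N; the label order (M, G, L, T) is chosen to match the labels that appear in the Best Method column, and the function name is illustrative.

```python
from itertools import combinations
from typing import List

OPTIONS = ["M", "G", "L", "T"]  # macro replacement, global reordering, local reordering, template matching


def configurations() -> List[str]:
    """Return a label for every subset of the options; the empty subset is the naive method N."""
    labels = []
    for size in range(len(OPTIONS) + 1):
        for subset in combinations(OPTIONS, size):
            labels.append("".join(subset) or "N")
    return labels


configs = configurations()
print(len(configs), configs)
```

Under this assumption there are 16 configurations per circuit, including labels such as MT, GT, LT, MGL, and GLT.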



Table 3 Experimental results (considering a SWAP gate as an elementary gate) 

                   Original circuit         Decomposed circuit
CIRCUIT               n    gc     qc     NNC   n        N        T        M       MT        G        L       GL     Best   Best   Best    Time  Ohead
                                                       qc       qc       qc       qc       qc       qc       qc       qc Method Impr.%    Sec.
...                 ...   ...    ...     ... ...      ...      ...     7948     5372     7948     3490     3490     3430     LT     56    3.61   3.05
decod24-v3_46         4     9      9      36   4       27       15       27       15       19       13       14       13      L     51    0.00   1.44
ham15_108            15    70    453    9978  15     5470     3842     5458     3854     4982     1181     1168     1168     GL     78    2.22   2.58
ham7_104              7    23     83     624   7      403      285      411      299      277      172      169      167     GT     58    0.09   2.01
hwb4_52               4    11     23      40   4       51       41       59       51       51       37       37       37      L     27    0.02   1.61
hwb5_55               5    24    104     470   5      347      207      353      219      271      185      186      183     LT     47    0.10   1.76
hwb6_58               6    42    142     710   6      532      328      512      348      520      302      279      278    GLT     47    0.16   1.96
hwb7_62               7   331   2325   16890   8    11023     6883    11033     6921    10347     6164     6019     5985     LT     45    3.90   2.57
hwb8_118              8   633  14260  115030   9    72060    41938    72032    42014    70368    38801    38802    38801      L     46   39.05   2.72
hwb9_123              9  1959  18124  189426  10   115167    69663   115180    69720   114441    54970    54961    54961     GL     52   73.77   3.03
mod5adder_128         6    15     83     600   6      395      203      394      212      283      168      169      168      L     57    0.08   2.02
mod8-10_177           5    14     88     582   6      387      195      393      205      269      186      183      165     GT     57    0.09   1.88
plus127mod8192_162   13   910  57400  661596  14   396286   268836   396272   268866   396286   211476   211476   209194     LT     47  269.17   3.64
plus63mod4096_163    12   429  25492  254864  13   152998   102612   152984   102642   152998    87156    87156    87048     LT     43   60.56   3.41
plus63mod8192_164    13   492  32578  397864  14   236066   161188   236052   161214   236066   117740   117740   115070     LT     51  132.34   3.53
rd32-v0_67            4     2     10      10   4       18       14       19       17       12       16       12       12      G     33    0.02   1.20
rd53_135              7    16     77     466   7      326      202      314      216      286      162      153      153     GL     53    0.11   1.99
rd73_140             10    20     76     450  10      314      176      315      197      266      152      149      138     LT     56    0.08   1.82
rd84_142             15    28    112     910  15      580      274      581      299      640      260      270      260      L     55    0.16   2.32
sym9_148             10   210   4368   48736  10    28820    18338    28820    18338    25444     9849    11309     9848     LT     65    8.88   2.25
sys6-v0_144          10    15     67     358  10      254      150      255      169      322      129      144      129      L     49    0.03   1.93
urf1_149              9 11554  57770  462708   9   303374   156424   300962   165226   258230   118005   118010   118005      L     61  193.37   2.04
urf2_152              8  5030  25150  171284   8   115826    61302   115666    65100   115826    50661    50661    50652     LT     56   35.66   2.01
urf3_155             10 26468 132340 1282724  10   795496   387346   799448   410372   732684   287016   287017   287012     LT     63  831.05   2.17
urf5_158              9 10276  51380  442748   9   280948   145478   280052   151332   256748   103823   103822   103820     LT     63   99.52   2.02
urf6_160             15 10740  53700  951276  15   531768   232354   531664   234208   480772   142604   142603   142600     LT     73  302.61   2.66

N: Naive method (i.e., synthesis, decomposition, SWAP insertion); T: with template matching; M: with macro replacement; G: with global reordering; L: with local reordering. 



Column Time denotes the overall run-time needed to generate the results for all possible configurations. 

Almost all results have been obtained in negligible run-time (i.e., in less than one CPU second). More run-time was needed only when template matching was enabled. 



